Abstract

We study the journey planning problem in public transit networks. Developing efficient
preprocessing-based speedup techniques for this problem has been challenging: current
approaches either require massive preprocessing effort or provide limited speedups.
Leveraging recent advances in Hub Labeling, the fastest algorithm for road networks,
we revisit the well-known time-expanded model for public transit. Exploiting
domain-specific properties, we provide simple and efficient algorithms for the earliest arrival,
profile, and multicriteria problems, with queries that are orders of magnitude faster
than the state of the art.


1 Introduction 

Recent research on route planning in transportation networks [5] has produced several 
speedup techniques varying in preprocessing time, space, query performance, and simplicity. 
Overall, queries on road networks are several orders of magnitude faster than on public 
transit [5]. Our aim is to reduce this gap.

There are many natural query types in public transit. An earliest arrival query seeks
a journey that arrives at a target stop $t$ as early as possible, given a source stop $s$ and a
departure time (e.g., “now”). A multicriteria query also considers the number of transfers
when traveling from $s$ to $t$. A profile query reports all quickest journeys between two stops
within a time range.

* An extended abstract of this paper has been accepted at the 14th International Symposium on Experimental
Algorithms (SEA’15). Work done mostly while all authors were at Microsoft Research Silicon Valley.

These problems can be approached by variants of Dijkstra’s algorithm [13] applied to
a graph modeling the public transit network, with various techniques to handle
time-dependency [18]. In particular, the time-expanded (TE) graph encodes time in the vertices,
creating a vertex for every event (e.g., a train departure or arrival at a stop at a specific time).
Newer approaches, like CSA [12] and RAPTOR [11], work directly on the timetable. Speedup
techniques [5], such as Transfer Patterns [4,6], Timetable Contraction Hierarchies [14], and
ACSA [20], use preprocessing to create auxiliary data that is then used to accelerate queries.

For aperiodic timetables, the TE model yields a directed acyclic graph (DAG), and several
public transit query problems translate to reachability problems. Although these can be
solved by simple graph searches, this is too slow for our application. Different methodologies
exist to enable faster reachability computation [7, 15, 16, 19, 21–23]. In particular, the
2-hop labeling scheme [8] associates with each vertex two labels (forward and backward);
reachability (or shortest-path distance) can be determined by intersecting the source’s
forward label and the target’s backward label. On continental road networks, 2-hop labeling
distance queries take less than a microsecond [2].

In this work, we adapt 2-hop labeling to public transit networks, improving query
performance by orders of magnitude over previous methods, while keeping preprocessing time
practical. Starting from the time-expanded graph model (Section 3), we extend the labeling
scheme by carefully exploiting properties of public transit networks (Section 4). Besides
earliest arrival and profile queries, we address multicriteria and location-to-location queries,
as well as reporting the full journey description quickly (Section 5). We validate our Public
Transit Labeling (PTL) algorithm by careful experimental evaluation on large metropolitan
and national transit networks (Section 6), achieving queries within microseconds.


2 Preliminaries 

Let $G = (V, A)$ be a (weighted) directed graph, where $V$ is the set of vertices and $A$ the set
of arcs. An arc between two vertices $u, v \in V$ is denoted by $(u, v)$. A path is a sequence of
adjacent vertices. A vertex $v$ is reachable from a vertex $u$ if there is a path from $u$ to $v$. A
DAG is a graph that is both directed and acyclic.

We consider aperiodic timetables, consisting of sets of stops $S$, events $E$, trips $T$, and
footpaths $F$. Stops are distinct locations where one can board a transit vehicle (such
as bus stops or subway platforms). Events are the scheduled departures and arrivals of
vehicles. Each event $e \in E$ has an associated stop $\mathrm{stop}(e)$ and time $\mathrm{time}(e)$. Let
$E(p) = \{e_0(p), \ldots, e_{k_p}(p)\}$ be the list (ordered by time) of events at a stop $p$. We set
$\mathrm{time}(e_i(p)) = -\infty$ for $i < 0$, and $\mathrm{time}(e_i(p)) = \infty$ for $i > k_p$. For simplicity, we may
drop the index of an event (as in $e(p) \in E(p)$) or its stop (as in $e \in E$). A trip is a sequence
of events served by the same vehicle. A pair of a consecutive departure and arrival events of
a trip is a connection. Footpaths model transfers between nearby stops, each with a
predetermined walking duration.

A journey planning algorithm outputs a set of journeys. A journey is a sequence of trips
(each with a pair of pick-up and drop-off stops) and footpaths in the order of travel. Journeys
can be measured according to several criteria, such as arrival time or number of transfers. A
journey $j_1$ dominates a journey $j_2$ if and only if $j_1$ is no worse in any criterion than $j_2$. If $j_1$
and $j_2$ are equal in all criteria, we break ties arbitrarily. A set of non-dominated journeys is
called a Pareto set. Multicriteria Pareto optimization is NP-hard in general, but practical
for natural criteria in public transit networks [12, 17, 18]. A journey is tight if there is no
other journey between the same source and target that dominates it in terms of departure
and arrival time, e.g., that departs later and arrives earlier.

Given a timetable, stops $s$ and $t$, and a departure time $\tau$, the $(s, t, \tau)$-earliest arrival (EA)
problem asks for an $s$-$t$ journey that arrives at $t$ as early as possible and departs at $s$
no earlier than $\tau$. The $(s, t)$-profile problem asks for a Pareto set of all tight journeys
between $s$ and $t$ over the entire timetable period. Finally, the $(s, t, \tau)$-multicriteria (MC)
problem asks for a Pareto set of journeys departing at $s$ no earlier than $\tau$ and minimizing
the criteria arrival time and number of transfers. We focus on computing the values of
the associated optimization criteria of the journeys (i.e., departure time, arrival time,
number of transfers), which is enough for many applications. Section 5 discusses how the
full journey description can be obtained with little overhead.

Our algorithms are based on the 2-hop labeling scheme for directed graphs [8]. It
associates with every vertex $v$ a forward label $L_f(v)$ and a backward label $L_b(v)$. In a
reachability labeling, labels are subsets of $V$, and vertices $u \in L_f(v) \cup L_b(v)$ are hubs of $v$.
Every hub in $L_f(v)$ must be reachable from $v$, which in turn must be reachable from every
hub in $L_b(v)$. In addition, labels must obey the cover property: for any pair of vertices $u$
and $v$, the intersection $L_f(u) \cap L_b(v)$ must contain at least one hub on a $u$-$v$ path (if it
exists). It follows from this definition that $L_f(u) \cap L_b(v) \neq \emptyset$ if and only if $v$ is reachable
from $u$.

In a shortest path labeling, each hub $u \in L_f(v)$ also keeps the associated distance
$\mathrm{dist}(v, u)$, or $\mathrm{dist}(u, v)$ for backward labels, and the cover property requires
$L_f(u) \cap L_b(v)$ to contain at least one hub on a shortest $u$-$v$ path. If labels are kept
sorted by hub ID, a distance label query efficiently computes $\mathrm{dist}(u, v)$ by a coordinated
linear sweep over $L_f(u)$ and $L_b(v)$, finding the hub $w \in L_f(u) \cap L_b(v)$ that minimizes
$\mathrm{dist}(u, w) + \mathrm{dist}(w, v)$. In contrast, a reachability label query can stop as soon as any
matching hub is found.
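
The two label queries described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the representation of labels as Python lists sorted by integer hub ID, and the toy labels are all hypothetical and not taken from the paper.

```python
def reachable(lf_u, lb_v):
    """Reachability query on two sorted lists of hub IDs.

    Returns True as soon as one common hub is found.
    """
    i = j = 0
    while i < len(lf_u) and j < len(lb_v):
        if lf_u[i] == lb_v[j]:
            return True
        if lf_u[i] < lb_v[j]:
            i += 1
        else:
            j += 1
    return False


def distance(lf_u, lb_v):
    """Distance query on two sorted lists of (hub, dist) pairs.

    Sweeps both labels once and minimizes dist(u, w) + dist(w, v)
    over all common hubs w; returns infinity if there is none.
    """
    best = float("inf")
    i = j = 0
    while i < len(lf_u) and j < len(lb_v):
        hub_u, dist_u = lf_u[i]
        hub_v, dist_v = lb_v[j]
        if hub_u == hub_v:
            best = min(best, dist_u + dist_v)
            i += 1
            j += 1
        elif hub_u < hub_v:
            i += 1
        else:
            j += 1
    return best


# Toy labels (hypothetical data): hubs 3 and 7 are shared.
forward_u = [(1, 4), (3, 2), (7, 9)]
backward_v = [(2, 1), (3, 5), (7, 1)]
print(distance(forward_u, backward_v))    # min(2 + 5, 9 + 1)
print(reachable([1, 3, 7], [2, 4, 6]))    # no common hub
```

Both functions run in time linear in the combined label size; the reachability variant simply exits at the first match.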

In general, smaller labels lead to less space and faster queries. Many algorithms to
compute labelings have been proposed [15, 21, 23], often for restricted graph classes.
We leverage (as a black box) the recent RXL algorithm [9], which efficiently computes small
shortest path labelings for a variety of graph classes at scale. It is a sampling-based greedy
algorithm that builds labels one hub at a time, with priority to vertices that cover as many
relevant paths as possible.

Different approaches for transforming a timetable into a graph exist (see [18] for an
overview). In this work, we focus on the time-expanded model. Since it uses scalar arc costs,
it is a natural choice for adapting the labeling approach. In contrast, the time-dependent
model (another popular approach) associates functions with the arcs, which makes adaptation
more difficult.


3 Basic Approach


We build the time-expanded graph from the timetable as follows. We group all departure
and arrival events by the stop where they occur. We sort all events at a stop by time,
merging events that happen at the same stop and time. We then add a vertex for each unique
event, a waiting arc between two consecutive events of the same stop, and a connection arc
for each connection (between the corresponding departure and arrival event). The cost of
arc $(u, v)$ is $\mathrm{time}(v) - \mathrm{time}(u)$, i.e., the time difference of the corresponding events. To
account for footpaths between two stops $a$ and $b$, we add, from each vertex at stop $a$, a foot
arc to the first reachable vertex at $b$ (based on walking time), and vice versa. As events
and vertices are tightly coupled in this model, we use the terms interchangeably.
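
As an illustration, the construction can be sketched as follows. This is only a sketch under assumptions: the input format (tuples of stops and integer times), the names `build_te_graph` and `arc_cost`, and the tiny timetable are hypothetical and chosen for the example.

```python
from bisect import bisect_left
from collections import defaultdict


def build_te_graph(connections, footpaths):
    """Build a time-expanded graph.

    connections: iterable of (dep_stop, dep_time, arr_stop, arr_time)
    footpaths:   iterable of (stop_a, stop_b, walking_time)
    Returns (events, arcs): events maps stop -> sorted unique event times,
    arcs is a set of ((stop, time), (stop, time)) pairs.
    """
    times = defaultdict(set)                  # sets merge coincident events
    for dep_stop, dep_time, arr_stop, arr_time in connections:
        times[dep_stop].add(dep_time)
        times[arr_stop].add(arr_time)
    events = {stop: sorted(ts) for stop, ts in times.items()}

    arcs = set()
    for stop, ts in events.items():           # waiting arcs
        for earlier, later in zip(ts, ts[1:]):
            arcs.add(((stop, earlier), (stop, later)))
    for dep_stop, dep_time, arr_stop, arr_time in connections:
        arcs.add(((dep_stop, dep_time), (arr_stop, arr_time)))   # connection arcs
    for a, b, walk in footpaths:              # foot arcs, both directions
        for src, dst in ((a, b), (b, a)):
            dst_times = events.get(dst, [])
            for t in events.get(src, []):
                k = bisect_left(dst_times, t + walk)   # first event reachable on foot
                if k < len(dst_times):
                    arcs.add(((src, t), (dst, dst_times[k])))
    return events, arcs


def arc_cost(arc):
    """Cost of arc (u, v) is time(v) - time(u)."""
    (_, time_u), (_, time_v) = arc
    return time_v - time_u


# Hypothetical timetable with four connections (times in minutes).
connections = [("A", 0, "B", 10), ("A", 5, "B", 12),
               ("B", 15, "C", 25), ("D", 20, "C", 30)]
footpaths = [("B", "D", 3)]
events, arcs = build_te_graph(connections, footpaths)
print(len(arcs), "arcs")
```

Every arc leads to an event that is not earlier than its tail, so for a timetable without zero-duration loops the result is a DAG, as the text relies on later.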

Any label generation scheme (we use RXL [9]) on the time-expanded graph creates
two (forward and backward) event labels for every vertex (event), enabling event-to-event
queries. For our application, reachability labels [21], which only store hubs (without
distances), suffice. First, since all arcs point to the future, time-expanded graphs are DAGs.
Second, if an event $e$ is reachable from another event $e'$ (i.e., $L_f(e') \cap L_b(e) \neq \emptyset$), we can
compute the time to get from $e'$ to $e$ as $\mathrm{time}(e) - \mathrm{time}(e')$. In fact, all paths between two
events have equal cost.

In practice, however, event-to-event queries are of limited use, as they require users to
specify both departure and arrival times, one of which is usually unknown. Therefore, we
discuss earliest arrival and profile queries, which optimize arrival time and are thus more
meaningful. See Section 5 for multicriteria queries.

Earliest Arrival Queries. Given event labels, we answer an $(s, t, \tau)$-EA query as follows.
We first find the earliest event $e_i(s) \in E(s)$ at the source stop $s$ that suits the departure
time, i.e., with $\mathrm{time}(e_i(s)) \geq \tau$ and $\mathrm{time}(e_{i-1}(s)) < \tau$. Next, we search at the target stop $t$
for the earliest event $e_j(t) \in E(t)$ that is reachable from $e_i(s)$ by testing whether
$L_f(e_i(s)) \cap L_b(e_j(t)) \neq \emptyset$ and $L_f(e_i(s)) \cap L_b(e_{j-1}(t)) = \emptyset$. Then, $\mathrm{time}(e_j(t))$ is the
earliest arrival time. One could find $e_j(t)$ using linear search (which is simple and
cache-friendly), but binary search is faster in theory and in practice. To accelerate queries,
we prune (skip) all events $e(t)$ with $\mathrm{time}(e(t)) < \tau$, since $L_f(e_i(s)) \cap L_b(e(t)) = \emptyset$ always
holds in such cases.

Moreover, to avoid evaluating $L_f(e_i(s))$ multiple times, we use hash-based queries [9]: we
first build a hash set of the hubs in $L_f(e_i(s))$, then check the reachability for an event $e(t)$
by probing the hash with hubs $h \in L_b(e(t))$.
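
A compact sketch of this query, combining pruning, hashing, and binary search, might look as follows. The data layout (parallel lists of event times and labels per stop), the function name, and the three-trip toy instance are assumptions made for illustration; the labels are untrimmed and were written by hand so that they satisfy the cover property.

```python
from bisect import bisect_left


def earliest_arrival(dep_times, fwd_labels, arr_times, bwd_labels, tau):
    """Earliest arrival at t when leaving s no earlier than tau, or None.

    dep_times/fwd_labels: sorted event times at s and their forward labels.
    arr_times/bwd_labels: sorted event times at t and their backward labels.
    """
    i = bisect_left(dep_times, tau)           # first event at s with time >= tau
    if i == len(dep_times):
        return None
    hubs = set(fwd_labels[i])                 # hash set, built once per query

    def is_reachable(j):
        return any(h in hubs for h in bwd_labels[j])

    lo = bisect_left(arr_times, tau)          # pruning: skip events before tau
    hi = len(arr_times)
    while lo < hi:                            # binary search: first reachable event
        mid = (lo + hi) // 2
        if is_reachable(mid):
            hi = mid
        else:
            lo = mid + 1
    return arr_times[lo] if lo < len(arr_times) else None


# Hypothetical instance: trips s@0 -> t@15, s@10 -> t@25, s@20 -> t@35.
# Hubs 3, 4, 5 stand for the events t@15, t@25, t@35.
dep_times = [0, 10, 20]
fwd_labels = [[3], [4], [5]]
arr_times = [15, 25, 35]
bwd_labels = [[3], [3, 4], [3, 4, 5]]
print(earliest_arrival(dep_times, fwd_labels, arr_times, bwd_labels, tau=5))
```

Binary search is valid here because waiting arcs make reachability monotone: once an event at the target stop is reachable, all later events at that stop are reachable as well.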

Profile Queries. To answer an $(s, t)$-profile query, we perform a coordinated sweep over
the events at $s$ and $t$. For the current event $e_i(s) \in E(s)$ at the source stop (initialized
to the earliest event $e_0(s) \in E(s)$), we find the first event $e_j(t) \in E(t)$ at the target stop
that is reachable, i.e., such that $L_f(e_i(s)) \cap L_b(e_j(t)) \neq \emptyset$ and
$L_f(e_i(s)) \cap L_b(e_{j-1}(t)) = \emptyset$. This gives us the earliest arrival time $\mathrm{time}(e_j(t))$. To
identify the latest departure time from $s$ for that earliest arrival event (and thus have a
tight journey), we increase $i$ until $L_f(e_i(s)) \cap L_b(e_j(t)) = \emptyset$, then add
$(\mathrm{time}(e_{i-1}(s)), \mathrm{time}(e_j(t)))$ to the profile. We repeat the process starting from the
events $e_i(s)$ and $e_{j+1}(t)$. Since we increase either $i$ or $j$
after each intersection test, the worst-case time to find all tight journeys is linear in the
number of events (at $s$ and $t$) multiplied by the size of their largest label.
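
The sweep can be sketched as below. This is an illustrative sketch, not the paper's code: it assumes untrimmed event labels given as lists of hub IDs, tests intersections with Python sets rather than a sorted merge, and uses a hypothetical toy instance.

```python
def profile(dep_times, fwd_labels, arr_times, bwd_labels):
    """Return all tight journeys between s and t as (departure, arrival) pairs."""

    def is_reachable(i, j):
        return not set(fwd_labels[i]).isdisjoint(bwd_labels[j])

    result = []
    i = j = 0
    while i < len(dep_times) and j < len(arr_times):
        if not is_reachable(i, j):
            j += 1                            # e_j(t) is too early for e_i(s)
            continue
        # e_j(t) is the earliest arrival for e_i(s); find the latest departure
        # that still reaches it.
        while i < len(dep_times) and is_reachable(i, j):
            i += 1
        result.append((dep_times[i - 1], arr_times[j]))
        j += 1                                # continue from e_i(s) and e_{j+1}(t)
    return result


# Hypothetical instance: trips s@0 -> t@15, s@10 -> t@25, s@20 -> t@35.
dep_times = [0, 10, 20]
fwd_labels = [[3], [4], [5]]
arr_times = [15, 25, 35]
bwd_labels = [[3], [3, 4], [3, 4, 5]]
print(profile(dep_times, fwd_labels, arr_times, bwd_labels))
```

Each iteration advances either `i` or `j`, which mirrors the linear bound on the number of intersection tests stated above.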

4 Leveraging Public Transit 

Our approach can be refined to exploit features specific to public transit networks. As 
described so far, our labeling scheme maintains reachability information for all pairs of 
events (by covering all paths of the time-expanded graph, breaking ties arbitrarily). However, 
in public transit networks we are actually interested only in certain paths. In particular,
the labeling does not need to cover any path ending at a departure event (or beginning at 
an arrival event). We can thus discard forward labels from arrival events and backward 
labels from departure events. 

Trimmed Event Labels. Moreover, we can disregard paths representing dominated
journeys that depart earlier and arrive later than others (i.e., journeys that are not tight,
cf. Section 2). Consider all departure events of a stop. If a certain hub is reachable from
event $e_i(s)$, then it is also reachable from $e_0(s), \ldots, e_{i-1}(s)$, and is thus potentially added
to the forward labels of all these earlier events. In fact, experiments show that on average
the same hub is added to 1.8-5.0 events per stop (depending on the network). We therefore
compute trimmed event labels by discarding all but the latest occurrence of each hub from
the forward labels. Similarly, we only keep the earliest occurrence of each hub in the
backward labels. (Preliminary experiments have shown that we obtain very similar label
sizes with a much slower algorithm that greedily covers tight journeys explicitly [2,9].)
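
The trimming step for a single stop can be sketched as follows; the function names and the list-of-lists layout (one label per event, events ordered by time) are assumptions for illustration.

```python
def trim_forward_labels(fwd_labels):
    """Keep only the latest occurrence of each hub among one stop's forward labels."""
    seen = set()
    trimmed = [None] * len(fwd_labels)
    for i in range(len(fwd_labels) - 1, -1, -1):    # latest event first
        trimmed[i] = [h for h in fwd_labels[i] if h not in seen]
        seen.update(fwd_labels[i])
    return trimmed


def trim_backward_labels(bwd_labels):
    """Keep only the earliest occurrence of each hub among one stop's backward labels."""
    seen = set()
    trimmed = []
    for label in bwd_labels:                        # earliest event first
        trimmed.append([h for h in label if h not in seen])
        seen.update(label)
    return trimmed


# Hypothetical labels of three events at one stop, ordered by time.
print(trim_forward_labels([[1, 2, 3], [2, 3], [3, 4]]))
print(trim_backward_labels([[1], [1, 2], [1, 2, 3]]))
```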

Unfortunately, we can no longer just apply the query algorithms from Section 3 with
trimmed event labels: if the selected departure event at $s$ does not correspond to a tight
journey toward $t$, the algorithm will not find a solution (though one might exist). One
could circumvent this issue by also running the algorithm from subsequent departure events
at $s$, which, however, may lead to quadratic query complexity in the worst case (for both EA
and profile queries).

Stop Labels. We solve this problem by working with stop labels: for each stop $p$, we merge
all forward event labels $L_f(e_0(p)), \ldots, L_f(e_k(p))$ into a forward stop label $SL_f(p)$, and all
backward event labels into a backward stop label $SL_b(p)$. Similar to distance labels, each
stop label $SL(p)$ is a list of pairs $(h, \mathrm{time}_p(h))$, each containing a hub and a time, sorted by
hub. For a forward label, $\mathrm{time}_p(h)$ encodes the latest departure time from $p$ to reach hub $h$.
More precisely, let $h$ be a hub in an event label $L_f(e_i(p))$: we add the pair $(h, \mathrm{time}(e_i(p)))$
to the stop label $SL_f(p)$ only if $h \notin L_f(e_j(p))$ for all $j > i$, i.e., only if $h$ does not appear
in the label of another event with a later departure time at the stop. Analogously, for backward
stop labels, $\mathrm{time}_p(h)$ encodes the earliest arrival time at $p$ from $h$.

By restricting ourselves to these entries, we effectively discard dominated (non-tight)
journeys to these hubs. It is easy to see that these stop labels obey a tight journey cover
property: for each pair of stops $s$ and $t$, $SL_f(s) \cap SL_b(t)$ contains at least one hub on each
tight journey between them (or any equivalent journey that departs and arrives at the same
time; recall from Section 2 that we allow arbitrary tie-breaking). This property does not,
however, imply that the label intersection only contains tight journeys: for example, $SL_f(s)$
and $SL_b(t)$ could share a hub that is important for long distance travel, but not to get
from $s$ to $t$. The remainder of this section discusses how we handle this fact during queries.
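
Merging event labels into stop labels can be sketched as follows. The function names and the input layout (event times of one stop in increasing order, with one hub list per event) are hypothetical; the sketch only illustrates the "latest departure" and "earliest arrival" rule stated above.

```python
def forward_stop_label(event_times, fwd_labels):
    """Merge forward event labels of one stop into sorted (hub, latest departure) pairs."""
    latest = {}
    for time, label in zip(event_times, fwd_labels):    # increasing time
        for hub in label:
            latest[hub] = time                          # later events overwrite earlier ones
    return sorted(latest.items())


def backward_stop_label(event_times, bwd_labels):
    """Merge backward event labels of one stop into sorted (hub, earliest arrival) pairs."""
    earliest = {}
    for time, label in zip(event_times, bwd_labels):    # increasing time
        for hub in label:
            earliest.setdefault(hub, time)              # keep the first (earliest) time
    return sorted(earliest.items())


# Hypothetical event labels of one stop.
print(forward_stop_label([0, 10, 20], [[1, 2, 3], [2, 3], [3, 4]]))
print(backward_stop_label([15, 25, 35], [[3], [3, 4], [3, 4, 5]]))
```

Each hub appears exactly once per stop label, which is where the reduction in scanned hubs comes from.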

Stop Label Profile Queries. To run an $(s, t)$-profile query on stop labels, we perform
a coordinated sweep over both labels $SL_f(s)$ and $SL_b(t)$. For every matching hub $h$,
i.e., $(h, \mathrm{time}_s(h)) \in SL_f(s)$ and $(h, \mathrm{time}_t(h)) \in SL_b(t)$, we consider the journey induced
by $(\mathrm{time}_s(h), \mathrm{time}_t(h))$ for output. However, since we are only interested in reporting tight
journeys, we maintain (during the algorithm) a tentative set of tight journeys, removing
dominated journeys from it on-the-fly. (We found this to be faster than adding all journeys
during the sweep and only discarding dominated journeys at the end.) We can further
improve the efficiency of this approach in practice by (globally) reassigning hub IDs by the
time of day. Note that every hub $h$ of a stop label is still also an event and carries an event
time $\mathrm{time}(h)$. (Not to be confused with $\mathrm{time}_s(h)$ and $\mathrm{time}_t(h)$.) We assign sequential IDs
to all hubs $h$ in order of increasing $\mathrm{time}(h)$, thus ensuring that hubs in the label intersection
are enumerated chronologically. Note that this does not imply that journeys are enumerated
in order of departure or arrival time, since each hub $h$ may appear anywhere along its
associated journey. However, preliminary experiments have shown that this approach leads
to fewer insertions into the tentative set of tight journeys, reducing query time. Moreover,
as in shortest path labels [9], we improve cache efficiency by storing the values for hubs and
times separately in a stop label, accessing times only for matching hubs.
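
A sketch of this query is given below. It assumes stop labels stored as lists of (hub, time) pairs sorted by hub ID; the helper `insert_tight`, which keeps the tentative set free of dominated entries with a simple linear scan, is a hypothetical stand-in for whatever structure an implementation would use.

```python
def insert_tight(journeys, dep, arr):
    """Insert (dep, arr) unless it is dominated; drop entries it dominates."""
    for d, a in journeys:
        if d >= dep and a <= arr:
            return                            # an existing journey is at least as good
    journeys[:] = [(d, a) for d, a in journeys if not (dep >= d and arr <= a)]
    journeys.append((dep, arr))


def stop_label_profile(sl_f, sl_b):
    """Coordinated sweep over SL_f(s) and SL_b(t); returns the tight journeys."""
    journeys = []
    i = j = 0
    while i < len(sl_f) and j < len(sl_b):
        (hub_s, dep), (hub_t, arr) = sl_f[i], sl_b[j]
        if hub_s == hub_t:                    # matching hub induces a journey
            insert_tight(journeys, dep, arr)
            i += 1
            j += 1
        elif hub_s < hub_t:
            i += 1
        else:
            j += 1
    return sorted(journeys)


# Hypothetical stop labels; hub 9 induces a journey dominated by the one via hub 5.
sl_f = [(3, 0), (4, 10), (5, 20), (9, 20)]
sl_b = [(3, 15), (4, 25), (5, 35), (9, 60)]
print(stop_label_profile(sl_f, sl_b))
```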

Overall, stop and event labels have different trade-offs: maintaining the profile requires 
less effort with event labels (any discovered journey is already tight), but fewer hubs are 
scanned with stop labels (there are no duplicate hubs). 

Stop Label Earliest Arrival Queries. Reassigned hub IDs also enable fast $(s, t, \tau)$-EA
queries. We use binary search in $SL_f(s)$ and $SL_b(t)$ to find the earliest relevant hub $h$, i.e.,
with $\mathrm{time}(h) \geq \tau$. From there, we perform a linear coordinated sweep as in the profile
query, finding $(h, \mathrm{time}_s(h)) \in SL_f(s)$ and $(h, \mathrm{time}_t(h)) \in SL_b(t)$. However, instead of
maintaining tentative profile entries $(\mathrm{time}_s(h), \mathrm{time}_t(h))$, we ignore solutions that depart
too early (i.e., $\mathrm{time}_s(h) < \tau$), while picking the hub $h^*$ that minimizes the tentative best
arrival time $\mathrm{time}_t(h^*)$. (Note that $\mathrm{time}(h) \geq \tau$ does not imply $\mathrm{time}_s(h) \geq \tau$.) Once we
scan a hub $h$ with $\mathrm{time}(h) > \mathrm{time}_t(h^*)$, the tentative best arrival time cannot be improved
anymore, and we stop the query. For practical performance, pruning the scan, so that we
only sweep hubs $h$ with $\tau \leq \mathrm{time}(h) \leq \mathrm{time}_t(h^*)$, is very important.
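
The pruned sweep can be sketched as follows. The sketch assumes that hub IDs are the integers 0, 1, 2, ... assigned in order of increasing event time, so that a list `hub_time` indexed by hub ID is non-decreasing; this list, the function name, and the toy labels are assumptions made for the example.

```python
from bisect import bisect_left


def stop_label_ea(sl_f, sl_b, hub_time, tau):
    """Earliest arrival at t when departing s no earlier than tau, or None.

    sl_f, sl_b: stop labels as lists of (hub, time) pairs sorted by hub ID.
    hub_time:   hub_time[h] is the event time of hub h (non-decreasing in h).
    """
    first = bisect_left(hub_time, tau)        # smallest hub ID with time(h) >= tau
    i = bisect_left(sl_f, (first,))           # binary search in both labels
    j = bisect_left(sl_b, (first,))
    best = None
    while i < len(sl_f) and j < len(sl_b):
        (hub_s, dep), (hub_t, arr) = sl_f[i], sl_b[j]
        # Any remaining match has a hub ID >= max(hub_s, hub_t); if that hub's
        # event is already later than the best arrival, nothing can improve.
        if best is not None and hub_time[max(hub_s, hub_t)] > best:
            break
        if hub_s == hub_t:
            if dep >= tau and (best is None or arr < best):
                best = arr                    # ignore journeys departing before tau
            i += 1
            j += 1
        elif hub_s < hub_t:
            i += 1
        else:
            j += 1
    return best


# Hypothetical data: hubs 0, 1, 2 are the events t@15, t@25, t@35.
hub_time = [15, 25, 35]
sl_f = [(0, 0), (1, 10), (2, 20)]
sl_b = [(0, 15), (1, 25), (2, 35)]
print(stop_label_ea(sl_f, sl_b, hub_time, tau=5))
```

The one-element tuple `(first,)` compares smaller than any pair `(first, x)`, so `bisect_left` lands on the first label entry whose hub ID is at least `first`.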

5 Practical Extensions


So far, we presented stop-to-stop queries, which report the departure and arrival times of
the quickest journey(s). In this section, we address multicriteria queries, general
location-to-location requests, and obtaining detailed journey descriptions.

Multicriteria Optimization and Minimum Transfer Time. Besides optimizing arrival
time, many users also prefer journeys with fewer transfers. To solve the underlying
multicriteria optimization problem, we adapt our labeling approach by (1) encoding transfers
as arc costs in the graph, (2) computing shortest path labels based on these costs (instead
of reachability labels on an unweighted graph), and (3) adjusting the query algorithm to
find the Pareto set of solutions.

Reconsider the earliest arrival graph from Section 3. As before, we add a vertex for
each unique event, linking consecutive events at the same stop with waiting arcs of cost 0.
However, each connection arc $(u, w)$ in the graph is subdivided by an intermediate connection
vertex $v$, setting the cost of arc $(u, v)$ to 0 and the cost of arc $(v, w)$ to 1. By interpreting
costs of 1 as leaving a vehicle, we can count the number of trips taken along any path. To
model staying in the vehicle, consecutive connection vertices of the same trip are linked by
zero-cost arcs.

A shortest path labeling on this graph now encodes the number of transfers as the shortest
path distance between two events, while the duration of the journey can still be deduced
from the time difference of the events. Consider a fixed source event $e(s)$ and the arrival
events of a target stop $e_0(t), e_1(t), \ldots$ in order of increasing time. The minimum number
of transfers required to reach the target stop $t$ never increases with arrival times. (Hence,
the whole Pareto set $P$ of multicriteria solutions can be computed with a single Dijkstra



We exploit this property to compute $(s, t, \tau)$-multicriteria (MC) queries from the
labels as follows. We initialize $P$ as the empty set. We then perform an $(s, t, \tau)$-EA
query (with all optimizations described in Section 3) to compute the fastest journey in the
solution, i.e., the one with most transfers. We add this journey to $P$. We then check (by
performing distance label queries) for each subsequent event at $t$ whether there is a journey
with fewer transfers (than the most recently added entry of $P$), in which case we add the
journey to $P$ and repeat. The MC query ends once the last event at the target stop has
been processed. We can stop earlier with the following optimization: we first run a distance
label query on the last event at $t$ to obtain the smallest possible number of transfers to
travel from $s$ to $t$. We may then already stop the MC query once we add a journey to $P$
with this many transfers. Note that, since we do not need to check for domination in $P$
explicitly, our algorithm maintains $P$ in constant time per added journey.
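
The MC query loop can be sketched as follows. Several simplifications are assumptions of this sketch and not part of the paper: distance labels are plain dictionaries mapping hub to distance, the first reachable event is found by a linear scan rather than by the optimized EA query, the source event is assumed to be already chosen, and distances count trips as encoded by the MC graph.

```python
INF = float("inf")


def label_distance(fwd, bwd):
    """Distance label query on dict labels {hub: dist}; INF if no common hub."""
    common = fwd.keys() & bwd.keys()
    return min((fwd[h] + bwd[h] for h in common), default=INF)


def mc_query(fwd_label, arr_times, bwd_labels):
    """Pareto set of (arrival time, trips) pairs for one fixed source event.

    fwd_label:  forward distance label of the source event.
    arr_times:  sorted event times at the target stop.
    bwd_labels: backward distance labels of those events, in the same order.
    """
    if not arr_times:
        return []
    fewest = label_distance(fwd_label, bwd_labels[-1])   # bound from the last event
    pareto = []
    for time, label in zip(arr_times, bwd_labels):
        trips = label_distance(fwd_label, label)
        if trips == INF:
            continue                          # event not reachable yet
        if not pareto or trips < pareto[-1][1]:
            pareto.append((time, trips))      # strictly fewer trips than last entry
            if trips == fewest:
                break                         # no later event can do better
    return pareto


# Hypothetical labels: arriving at 15 or 25 needs 2 trips, arriving at 35 only 1.
fwd_label = {10: 0, 11: 1}
arr_times = [15, 25, 35]
bwd_labels = [{11: 1}, {10: 2, 11: 1}, {10: 1}]
print(mc_query(fwd_label, arr_times, bwd_labels))
```

Because the trip count never increases with arrival time, comparing against the most recently added entry is enough, and no explicit domination check is needed.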

Minimum Transfer Times. Transit agencies often model an entire station with multiple
platforms as a single stop and account for the time required to change trips inside the station
by associating a minimum transfer time $\mathrm{mtt}(p)$ with each stop $p$. To incorporate them into
the EA graph, we first locally replace each affected stop $p$ by a set of new stops $p^*$, distributing
conflicting trips (between which transferring is impossible due to $\mathrm{mtt}(p)$) to different stops
of $p^*$. We then add footpaths between all pairs of stops in $p^*$ with length $\mathrm{mtt}(p)$. A small
set $p^*$ can be computed by solving an appropriate coloring problem [10]. For the MC graph,
we need not change the input. Instead, it is sufficient to shift each arrival event $e \in E(p)$
by adding $\mathrm{mtt}(p)$ to $\mathrm{time}(e)$ before creating the vertices.

Location-to-Location Queries. A query between arbitrary locations $s^*$ and $t^*$, which
may employ walking or driving as the first and last legs of the journey, can be handled
by a two-stage approach. It first computes sets $S$ and $T$ of relevant stops near the
origin $s^*$ and destination $t^*$ that can be reached by car or on foot. With that information,
a forward superlabel [1] is built from all forward stop labels associated with $S$. For
each entry $(h, \mathrm{time}_p(h)) \in SL_f(p)$ in the label of stop $p \in S$, we adjust the departure
time $\mathrm{time}^*(h) = \mathrm{time}_p(h) - \mathrm{dist}(s^*, p)$ so that the journey starts at $s^*$ and add $(h, \mathrm{time}^*(h))$
to the superlabel. For duplicate hubs that occur in multiple stop labels, we keep only the
latest departure time from $s^*$. This can be achieved with a coordinated sweep, always
adding the next hub of minimum ID. A backward superlabel (for $T$) is built analogously.
For location-to-location queries, we then simply run our stop-label-based EA and profile
query algorithms using the superlabels. In practice, we need not build superlabels explicitly
but can simulate the building sweep during the query (which in itself is a coordinated sweep
over two labels). A similar approach is possible for event labels. Moreover, point-of-interest
queries (such as finding the closest restaurants to a given location) can be computed by
applying known techniques [1] to these superlabels.
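
Building a forward superlabel explicitly can be sketched as follows. The function name, the dictionary inputs, and the toy values are assumptions; `heapq.merge` from the Python standard library plays the role of the coordinated sweep that always yields the next hub of minimum ID.

```python
import heapq


def forward_superlabel(stop_labels, access_time):
    """Merge forward stop labels of nearby stops into one superlabel.

    stop_labels: {stop: list of (hub, time_p(h)) pairs sorted by hub ID}
    access_time: {stop: time needed to get from the origin s* to that stop}
    Returns (hub, latest departure time from s*) pairs sorted by hub ID.
    """
    shifted = [
        [(hub, time - access_time[stop]) for hub, time in label]
        for stop, label in stop_labels.items()
    ]
    superlabel = []
    for hub, time in heapq.merge(*shifted):   # hubs arrive in increasing ID
        if superlabel and superlabel[-1][0] == hub:
            # Duplicate hub: keep only the latest departure from s*.
            superlabel[-1] = (hub, max(superlabel[-1][1], time))
        else:
            superlabel.append((hub, time))
    return superlabel


# Hypothetical stops p and q, 5 and 20 time units away from the origin.
stop_labels = {"p": [(1, 100), (3, 120)], "q": [(1, 110), (2, 90)]}
access_time = {"p": 5, "q": 20}
print(forward_superlabel(stop_labels, access_time))
```

A backward superlabel would add the egress time instead of subtracting the access time and keep the earliest arrival per hub.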

Journey Descriptions. While for many applications it suffices to report departure and 
arrival times (and possibly the number of transfers) per journey, sometimes a more detailed 
description is needed. We could apply known path unpacking techniques [1] to retrieve the
full sequence of connections (and transfers), but in public transit it is usually enough to 
report the list of trips with associated transfer stops. We can accomplish that by storing 
with each hub the sequences of trips (and transfer stops) for travel between the hub and its 
label vertex. 


6 Experiments 

Setup. We implemented all algorithms in C++ using Visual Studio 2013 with full optimization.
All experiments were conducted on a machine with two 8-core Intel Xeon E5-2690
CPUs and 384 GiB of DDR3-1066 RAM, running Windows 2008R2 Server. All runs are 
sequential. We use at most 32 bits for distances. 

We consider four realistic inputs: the metropolitan networks of London (data.london.gov.uk)
and Madrid (emtmadrid.es), and the national networks of Sweden (trafiklab.se)
and Switzerland (gtfs.geops.ch). London includes all modes of transport, Madrid contains
only buses, and the national networks contain both long-distance and local transit. We
consider 24-hour timetables for the metropolitan networks, and two days for national ones
(to enable overnight journeys). Footpaths were generated using a known heuristic [10] for
Madrid; they are part of the input for the other networks. See Table 1 for size figures of
the timetables and resulting graphs. The average number of unique events per stop ranges
from 160 for Sweden to 644 for Madrid. (Recall from Section 3 that we merge all coincident
events at a stop.) Note that no two instances dominate each other (w.r.t. number of stops,
connections, trips, events per stop, and footpaths).


Table 1. Size of timetables and the earliest arrival (EA) and multicriteria (MC) graphs.

                                                               EA Graph                MC Graph
Instance        Stops      Conns      Trips   Footp.  Dy.       |V|        |A|        |V|         |A|
London         20.8 k    5,133 k      133 k   45.7 k    1   4,719 k   51,043 k    9,852 k    72,162 k
Madrid          4.7 k    4,527 k      165 k    1.3 k    1   3,003 k   13,730 k    7,530 k    34,505 k
Sweden         51.1 k   12,657 k      548 k    1.1 k    2   8,151 k   34,806 k   20,808 k    93,194 k
Switzerland    27.1 k   23,706 k    2,198 k   29.8 k    2   7,979 k   49,656 k   31,685 k   170,503 k

Preprocessing. Table 2 reports preprocessing figures for the unweighted earliest arrival
graph (which also enables profile queries) and the multicriteria graph. For earliest
arrival (EA), preprocessing takes well below an hour and generates about one gigabyte, which
is quite practical. Although there are only 37-70 hubs per label, the total number of hubs
per stop (i.e., the combined size of all labels) is quite large (5,630-49,247). By eliminating
redundancy (cf. Section 4), stop labels have only a fifth as many hubs (for Madrid). Even
though they need to store an additional distance value per hub, total space usage is still
smaller. In general, average label sizes (though not total space) are higher for metropolitan
instances. This correlates with the higher number of daily journeys in these networks.


Table 2. Preprocessing figures. Label sizes are averages of forward and backward labels.

              Earliest Arrival                                             Multicriteria
              RXL     Event Labels                  Stop Labels            RXL     Event Labels
Instance      [h:m]   Hubs     Hubs      Space      Hubs      Space        [h:m]   Hubs     Hubs      Space
                      p. lbl   p. stop   [MiB]      p. stop   [MiB]                p. lbl   p. stop   [MiB]
London        0:54    70       15,480    1,334      7,075     1,257        49:19   734      162,565   26,871
Madrid        0:25    77       49,247    963        9,830     403          10:55   404      258,008   10,155
Sweden        0:32    37       5,630     1,226      1,536     700          36:14   190      29,046    12,637
Switzerland   0:42    42       11,189    1,282      2,970     708          61:36   216      58,022    12,983


Table 3. Evaluating earliest arrival queries. Bullets (•) indicate different features: profile
query (Prof.), stop labels (St.lbs.), pruning (Prn.), hashing (Hash), and binary search (Bin.).
The column “=” indicates the average number of matched hubs.

                              London                        Sweden                     Switzerland
Prof. St.lbs. Prn. Hash Bin.  Lbls.  Hubs    =      [µs]    Lbls.  Hubs    =    [µs]   Lbls.  Hubs    =    [µs]
◦     ◦       ◦    ◦    ◦     108.4  6,936   1      14.7    68.0   2,415   1    6.9    89.0   3,485   1    8.7
◦     ◦       •    ◦    ◦     16.1   1,360   1      5.9     34.4   1,581   1    5.4    33.5   1,676   1    5.8
◦     ◦       •    •    ◦     16.1   1,047   1      4.2     34.4   1,083   1    3.6    33.5   1,151   1    3.8
◦     ◦       •    •    •     7.0    332     4      2.8     6.5    179     3    2.1    7.6    204     4    2.1

◦     •       ◦    ◦    ◦     2.0    13,037  1,126  54.8    2.0    2,855   81   10.0   2.0    5,707   218  20.4
◦     •       •    ◦    ◦     2.0    861     62     6.2     2.0    711     16   3.6    2.0    699     19   3.8

•     ◦       ◦    ◦    ◦     658.5  40,892  211    141.7   423.7  13,590  118  39.4   786.6  29,381  240  81.4
•     •       ◦    ◦    ◦     2.0    13,037  1,126  74.3    2.0    2,855   81   12.1   2.0    5,707   218  24.5

Preprocessing the multicriteria (MC) graph is much more expensive: times increase by a 
factor of 26.2-54.8 for the metropolitan and 67.9-88 for the national networks. On Madrid, 
Sweden, and Switzerland labels are five times larger compared to EA, and on London the 
factor is even more than ten. This is immediately reflected in the space consumption, which 
is up to 26 GiB (London).

Queries. We now evaluate query performance. For each algorithm, we ran 100,000 
queries between random source and target stops, at random departure times between 0:00 
and 23:59 (of the first day). Table 3 reports detailed figures, organized in three blocks:
event label EA queries, stop label EA queries, and profile queries (with both event and stop 
labels). We discuss MC queries later. 

We observe that event labels result in extremely fast EA queries (6.9-14.7 µs), even
without optimizations. As expected, pruning and hashing reduce the number of accesses to
labels and hubs (see columns “Lbls.” and “Hubs”). Although binary search cannot stop as
soon as a matching hub is found (see the “=” column), it accesses fewer labels and hubs,
achieving query times below 3 µs on all instances.

Using stop labels (cf. Section 4) in their basic form is significantly slower than using
event labels. With pruning enabled, however, query times (3.6-6.2 µs) are within a factor
of two of the event labels, while saving a factor of 1.1-2.4 in space. For profile queries, stop
labels are clearly the best approach: they scan up to a factor of 5.1 fewer hubs and are up
to 3.3 times faster, computing the profile of the full timetable period in under 80 µs on all
instances. The difference in factors is due to the overhead of maintaining the Pareto set
during the stop label query.

Table 4. Comparison with the state of the art. Presentation largely based on [5], with
some additional results taken from [6]. The first block of techniques considers the EA
problem, the second the MC problem and the third the profile problem.

              Instance                                Criteria            Prep.   Jn.     Query
Algorithm     Name         Stops     Conns     Dy.    Arr.  Tran.  Prof.  [h]             [ms]
                           [·10^3]   [·10^6]
CSA [12]      London       20.8      4.9       1      •     ◦      ◦      -       n/a     1.8
ACSA [20]     Germany      252.4     46.2      2      •     ◦      ◦      0.2     n/a     8.7
CH [14]       Europe (LD)  30.5      1.7       P      •     ◦      ◦      <0.1    n/a     0.3
TP [4]        Madrid       4.6       4.8       1      •     ◦      ◦      19      n/a     0.7
TP [6]        Germany      248.4     13.9      1      •     ◦      ◦      249     0.9     0.2
PTL           London       20.8      5.1       1      •     ◦      ◦      0.9     0.9     0.0028
PTL           Madrid       4.7       4.5       1      •     ◦      ◦      0.4     0.9     0.0030
PTL           Sweden       51.1      12.7      2      •     ◦      ◦      0.5     1.0     0.0021
PTL           Switzerland  27.1      23.7      2      •     ◦      ◦      0.7     1.0     0.0021

RAPTOR [11]   London       20.8      5.1       1      •     •      ◦      -       1.8     5.4
TP [4]        Madrid       4.6       4.8       1      •     •      ◦      185     n/a     3.1
TP [6]        Germany      248.4     13.9      1      •     •      ◦      372     1.9     0.3
PTL           London       20.8      5.1       1      •     •      ◦      49.3    1.8     0.0266
PTL           Madrid       4.7       4.5       1      •     •      ◦      10.9    1.9     0.0643
PTL           Sweden       51.1      12.7      2      •     •      ◦      36.2    1.7     0.0276
PTL           Switzerland  27.1      23.7      2      •     •      ◦      61.6    1.7     0.0217

CSA [12]      London       20.8      4.9       1      •     ◦      •      -       98.2    161.0
ACSA [20]     Germany      252.4     46.2      2      •     ◦      •      0.2     n/a     171.0
CH [14]       Europe (LD)  30.5      1.7       P      •     ◦      •      <0.1    n/a     3.7
TP [6]        Germany      248.4     13.9      1      •     ◦      •      249     16.4    3.3
PTL           London       20.8      5.1       1      •     ◦      •      0.9     81.0    0.0743
PTL           Madrid       4.7       4.5       1      •     ◦      •      0.4     110.7   0.1119
PTL           Sweden       51.1      12.7      2      •     ◦      •      0.5     12.7    0.0121
PTL           Switzerland  27.1      23.7      2      •     ◦      •      0.7     31.5    0.0245

Comparison. Table 4 compares our new algorithm (indicated as PTL, for Public Transit 
Labeling) to the state of the art and also evaluates multicriteria queries. In this experiment, 
PTL uses event labels with pruning, hashing and binary search for earliest arrival (and 
multicriteria) queries, and stop labels for profile queries. We compare PTL to CSA [12] 
and RAPTOR [11] (currently the fastest algorithms without preprocessing), as well as 
Accelerated CSA (ACSA) [20], Timetable Contraction Hierarchies (CH) [14], and Transfer 
Patterns (TP) [4, 6] (which make use of preprocessing). Since RAPTOR always optimizes 
transfers (by design), we only include it for the MC problem. Note that the following 
evaluation should be taken with a grain of salt, as no standardized benchmark instances 
exist, and many data sets used in the literature are proprietary. Although precise numbers 
are not available for several competing methods, it is safe to say they use less space than 
PTL, particularly for the MC problem. 

Table 4 shows that PTL queries are very efficient. Remarkably, they are faster on the 
national networks than on the metropolitan ones: the latter are smaller in most aspects, 
but have more frequent journeys (that must be covered). Compared to other methods, PTL 
is 2-3 orders of magnitude faster on London than CSA and RAPTOR for EA (factor 643), 
profile (factor 2,167), and MC (factor 203) queries. We note, however, that PTL is a 
point-to-point algorithm (as are ACSA, TP, and CH); for one-to-all queries, CSA and 
RAPTOR would be faster. 

PTL has 1-2 orders of magnitude faster preprocessing and queries than TP for the EA 
and profile problems. On Madrid, EA queries are 233 times faster while preprocessing is 
faster by a factor of 48. Note that Sweden (PTL) and Germany (TP) have a similar number 
of connections, but PTL queries are 95 times faster. (Germany does have more stops, but 
recall that PTL query performance depends more on the frequency of trips.) For the MC 
problem, the difference is smaller, but both preprocessing and queries of PTL are still an 
order of magnitude faster than TP (up to 48 times for MC queries on Madrid). 
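
The factors quoted in the two preceding paragraphs follow directly from the entries of Table 4.
The short script below recomputes them as a sanity check; the dictionary layout and the helper
name `factor` are illustrative choices for this example, while the numbers are copied from the
table.

```python
# Query times [ms] from Table 4, keyed by (algorithm, instance, problem).
QUERY_MS = {
    ("CSA", "London", "EA"): 1.8,
    ("CSA", "London", "profile"): 161.0,
    ("RAPTOR", "London", "MC"): 5.4,
    ("TP", "Madrid", "EA"): 0.7,
    ("TP", "Madrid", "MC"): 3.1,
    ("TP", "Germany", "EA"): 0.2,
    ("PTL", "London", "EA"): 0.0028,
    ("PTL", "London", "profile"): 0.0743,
    ("PTL", "London", "MC"): 0.0266,
    ("PTL", "Madrid", "EA"): 0.0030,
    ("PTL", "Madrid", "MC"): 0.0643,
    ("PTL", "Sweden", "EA"): 0.0021,
}

# Preprocessing times [h] from Table 4 (EA problem, Madrid).
PREP_H = {
    ("TP", "Madrid", "EA"): 19.0,
    ("PTL", "Madrid", "EA"): 0.4,
}


def factor(table, baseline, ours):
    """Ratio of the baseline entry to our entry (larger means we are faster)."""
    return table[baseline] / table[ours]


comparisons = [
    # (description, table, baseline key, PTL key, factor quoted in the text)
    ("London EA vs. CSA", QUERY_MS, ("CSA", "London", "EA"), ("PTL", "London", "EA"), 643),
    ("London profile vs. CSA", QUERY_MS, ("CSA", "London", "profile"), ("PTL", "London", "profile"), 2167),
    ("London MC vs. RAPTOR", QUERY_MS, ("RAPTOR", "London", "MC"), ("PTL", "London", "MC"), 203),
    ("Madrid EA vs. TP", QUERY_MS, ("TP", "Madrid", "EA"), ("PTL", "Madrid", "EA"), 233),
    ("Madrid EA prep. vs. TP", PREP_H, ("TP", "Madrid", "EA"), ("PTL", "Madrid", "EA"), 48),
    ("Sweden (PTL) vs. Germany (TP), EA", QUERY_MS, ("TP", "Germany", "EA"), ("PTL", "Sweden", "EA"), 95),
    ("Madrid MC vs. TP", QUERY_MS, ("TP", "Madrid", "MC"), ("PTL", "Madrid", "MC"), 48),
]

for name, table, base, ours, quoted in comparisons:
    print(f"{name}: computed {factor(table, base, ours):.1f}, quoted {quoted}")
```

Each computed ratio agrees with the quoted factor after rounding to the nearest integer; the
preprocessing ratio on Madrid is 47.5, which the text reports as 48.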

Compared to ACSA and CH (for which figures are only available for the EA and 
profile problems), PTL has slower preprocessing but significantly faster queries (even when 
accounting for different network sizes). 


7 Conclusion 

We introduced PTL, a new preprocessing-based algorithm for journey planning in public 
transit networks, by revisiting the time-expanded model and adapting the Hub Labeling 
approach to it. By further exploiting structural properties specific to timetables, we 
obtained simple and efficient algorithms that outperform the current state of the art on large 
metropolitan and country-sized networks by orders of magnitude for various realistic query 
types. Future work includes developing tailored algorithms for hub computation (instead 
of using RXL as a black box), compressing the labels (e.g., using techniques from [6] 
and [9]), exploring other hub representations (e.g., using trips instead of events, as in 3-hop 
labeling [21]), using multicore- and instruction-based parallelism for preprocessing and 
queries, and handling dynamic scenarios (e.g., temporary station closures and train delays 
or cancellations [7]). 